Whole-orbit radiomics: machine learning-based multi- and fused- region radiomics signatures for intravenous glucocorticoid response prediction in thyroid eye disease

Background
Radiomics analysis of orbital magnetic resonance imaging (MRI) has shown preliminary potential for predicting the response to intravenous glucocorticoid (IVGC) therapy in thyroid eye disease (TED). Current region-of-interest segmentation covers only a single structure, the extraocular muscles (EOMs). Considering all orbital soft tissues could yield a better prediction model.

Methods
In this retrospective study, we enrolled 127 patients with TED who received 4·5 g IVGC therapy and had complete follow-up examinations. Pre-treatment orbital T2-weighted imaging (T2WI) was acquired for all subjects. Using a multi-organ segmentation (MOS) strategy, we contoured the EOMs, lacrimal gland (LG), orbital fat (OF), and optic nerve (ON) separately. Using fused-organ segmentation (FOS), we contoured the same structures as a single cohesive unit. Whole-orbit radiomics (WOR) models, consisting of a multi-regional radiomics (MRR) model and a fused-regional radiomics (FRR) model, were then constructed using six machine learning (ML) algorithms.

Results
The support vector machine (SVM) classifier had the best performance on the MRR model (AUC = 0·961). The MRR model outperformed the single-regional radiomics (SRR) models (highest AUC = 0·766, XGBoost on EOMs, or LR on OF) and the conventional semiquantitative imaging model (highest AUC = 0·760, NaiveBayes). Whether the MRR model or the FRR model (highest AUC = 0·916, LR) performed better depended on the ML algorithm used.

Conclusions
The WOR models achieved a satisfactory result in IVGC response prediction of TED. It would be beneficial to include more orbital structures and implement ML algorithms when constructing radiomics models. Whether orbital soft tissues are best segmented separately or as a whole remains unresolved.

Supplementary Information
The online version contains supplementary material available at 10.1186/s12967-023-04792-2.


Logistic Regression
Logistic regression is a generalized linear model that assumes the outcome follows a Bernoulli distribution. It estimates its parameters by maximizing the likelihood function, typically with gradient descent, and is used for binary classification. The logistic regression model can be seen as a linear regression model whose output is normalized by the Sigmoid function (also known as the Logistic equation) [1]. The Sigmoid function maps the output of the linear model (referred to as the logit) into the interval (0, 1) and passes through the point (0, 0.5). Because the output is bounded in this way, it can be read as a probability and thresholded at 0.5: values greater than 0.5 are assigned to one class, while values less than 0.5 are assigned to the other class [2]. During model development, the tuning step involved the "C", "penalty", and "solver" parameters.
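The sigmoid mapping and the 0.5 threshold described above can be illustrated with a minimal sketch. It is written in plain Python and is not the pipeline used in the study: the toy data, the learning rate, the epoch count, and the helper names (`sigmoid`, `predict_proba`, `fit_logistic`) are all assumptions made for illustration, and no regularization ("C", "penalty") is applied.

```python
import math

def sigmoid(z):
    """Map a real-valued logit into the interval (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # avoids overflow for very negative z
    return ez / (1.0 + ez)

def predict_proba(w, b, x):
    """Probability of the positive class for one sample."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Batch gradient descent on the mean negative log-likelihood."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # derivative w.r.t. the logit
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Hypothetical one-feature toy data: class 1 for larger feature values.
X_toy = [[0.0], [1.0], [2.0], [3.0]]
y_toy = [0, 0, 1, 1]

w, b = fit_logistic(X_toy, y_toy)
labels = [int(predict_proba(w, b, x) >= 0.5) for x in X_toy]
print(sigmoid(0.0), labels)
```

The call `sigmoid(0.0)` returns 0.5, which is the point (0, 0.5) mentioned in the text; samples whose predicted probability reaches the threshold are labeled 1.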

NaiveBayes
NaiveBayes is a popular and simple probabilistic machine learning algorithm used for classification tasks. It is based on Bayes' theorem and assumes independence among the features of the input data, hence the term "naive" [3]. The algorithm calculates the probability of a sample belonging to each class and assigns it to the class with the highest probability [4].
Naive Bayes utilizes prior knowledge of the class distribution and the likelihood of observing certain features to make predictions.
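As an illustration of combining a class prior with per-feature likelihoods, the following is a minimal sketch of a Gaussian Naive Bayes classifier in plain Python. The Gaussian likelihood is an assumption (the text does not state which variant was used), and the toy data and the names `fit_gaussian_nb`, `log_posteriors`, and `predict_nb` are hypothetical.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Estimate the class prior and per-feature mean/variance for each class."""
    rows_by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        rows_by_class[yi].append(xi)
    model = {}
    for label, rows in rows_by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        variances = [
            sum((v - m) ** 2 for v in col) / n + var_floor  # floor avoids division by zero
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(X), means, variances)
    return model

def log_posteriors(model, x):
    """Unnormalized log posterior: log prior plus a sum of per-feature log likelihoods."""
    scores = {}
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for v, m, var in zip(x, means, variances):
            # The "naive" independence assumption lets the features be summed one by one.
            score += -0.5 * math.log(2.0 * math.pi * var) - (v - m) ** 2 / (2.0 * var)
        scores[label] = score
    return scores

def predict_nb(model, x):
    """Assign the class with the highest posterior score."""
    scores = log_posteriors(model, x)
    return max(scores, key=scores.get)

# Hypothetical two-feature toy data with two well-separated classes.
X_toy = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [3.0, 0.5], [3.2, 0.7], [2.8, 0.4]]
y_toy = [0, 0, 0, 1, 1, 1]

model = fit_gaussian_nb(X_toy, y_toy)
print(predict_nb(model, [1.0, 2.0]), predict_nb(model, [3.0, 0.6]))
```

Working in log space keeps the product of many small likelihoods from underflowing, which matters once the number of features grows.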

Support Vector Machine
Support Vector Machine (SVM) is a powerful and widely used supervised machine learning algorithm for both classification and regression tasks [5]. SVM is particularly effective in dealing with complex and high-dimensional datasets.
The fundamental principle of SVM is to find an optimal hyperplane that separates different classes in the feature space. It aims to maximize the margin between the classes, which is the distance between the hyperplane and the nearest data points from each class. These data points, known as support vectors, play a crucial role in defining the decision boundary [6].
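The margin-maximizing idea can be sketched with a linear soft-margin SVM trained by subgradient descent on the regularized hinge loss. This is one simple way to fit such a model, not necessarily the solver or kernel used in the study; the toy data, the hyperparameters (`lam`, `lr`, `epochs`), and the function names are assumptions for illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=500):
    """Subgradient descent on lam/2 * ||w||^2 + mean hinge loss; labels must be -1 or +1."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularization term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Only points inside the margin (or misclassified) contribute to the hinge loss.
            if yi * (dot(w, xi) + b) < 1.0:
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

# Hypothetical, linearly separable toy data centered on the origin.
X_toy = [[-2.0, -1.0], [-1.0, -2.0], [-2.0, -2.0],
         [2.0, 1.0], [1.0, 2.0], [2.0, 2.0]]
y_toy = [-1, -1, -1, 1, 1, 1]

w, b = train_linear_svm(X_toy, y_toy)
predictions = [1 if dot(w, x) + b >= 0 else -1 for x in X_toy]
margin_width = 2.0 / math.sqrt(dot(w, w))  # distance between the two margin hyperplanes
print(predictions, round(margin_width, 3))
```

In this sketch the points closest to the boundary are the ones that keep triggering the hinge-loss update, which mirrors the role of support vectors in defining the decision boundary.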

ExtraTrees, XGBoost and LightGBM
Random forests are ensemble models consisting of multiple decision trees [7]. They can be used for both classification and regression tasks, including multi-class classification. Random forests introduce nonlinearity and randomness into the model.
In classification, each decision tree predicts the test sample, and the final prediction is determined by majority voting.
Unlike traditional decision trees, random forests consider only a random subset of the features at each node split, instead of all features. This randomness slightly increases the bias but reduces the variance by averaging the predictions, resulting in an improved overall model.
The Extra-Trees (Extremely randomized trees) method is similar to random forests and is often considered a variant of them [8]. However, Extra-Trees exhibit even greater randomness in the splitting of decision tree nodes. Each decision tree in Extra-Trees splits on a randomly chosen feature and a randomly drawn threshold, leading to more diversity between submodels. This increased randomness helps suppress overfitting, as extreme data points have less influence due to the high variability among decision trees. The result is lower variance at the cost of potentially higher bias.
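The difference between the two splitting rules, and the majority vote, can be made concrete with a minimal sketch in plain Python. It covers only a single node split, not a full tree or forest; the Gini criterion, the toy data, and the function names are assumptions for illustration, and the Extra-Trees rule follows the common formulation in which one random threshold is drawn per candidate feature and the best of those is kept.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_impurity(X, y, feature, threshold):
    """Weighted Gini impurity of the two children produced by a split."""
    left = [yi for xi, yi in zip(X, y) if xi[feature] <= threshold]
    right = [yi for xi, yi in zip(X, y) if xi[feature] > threshold]
    n = len(y)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def random_forest_split(X, y, n_features, rng):
    """Random feature subset, then an exhaustive search over midpoint thresholds."""
    features = rng.sample(range(len(X[0])), n_features)
    candidates = []
    for f in features:
        values = sorted({xi[f] for xi in X})
        candidates += [(f, (lo + hi) / 2.0) for lo, hi in zip(values, values[1:])]
    return min(candidates, key=lambda c: split_impurity(X, y, c[0], c[1]))

def extra_trees_split(X, y, n_features, rng):
    """Random feature subset, one uniformly drawn threshold per feature."""
    features = rng.sample(range(len(X[0])), n_features)
    candidates = []
    for f in features:
        column = [xi[f] for xi in X]
        candidates.append((f, rng.uniform(min(column), max(column))))
    return min(candidates, key=lambda c: split_impurity(X, y, c[0], c[1]))

def majority_vote(predictions):
    """Final class label from the votes of the individual trees."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical toy data: feature 0 separates the classes, feature 1 does not.
X_toy = [[1.0, 5.0], [2.0, 3.0], [3.0, 6.0], [4.0, 2.0]]
y_toy = [0, 0, 1, 1]

rng = random.Random(0)
rf_split = random_forest_split(X_toy, y_toy, n_features=2, rng=rng)
et_split = extra_trees_split(X_toy, y_toy, n_features=2, rng=rng)
print(rf_split, et_split, majority_vote([1, 0, 1]))
```

The random forest rule searches every midpoint of the sampled features, so its split is locally optimal; the Extra-Trees rule only compares a handful of random cut points, which is cheaper and produces more diverse trees.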
LightGBM was introduced as an optimization over XGBoost, a popular gradient boosting decision tree (GBDT) tool [9].
LightGBM aims to accelerate GBDT model training without compromising accuracy through several optimizations. It utilizes a histogram-based decision tree algorithm and implements Gradient-based One-Side Sampling (GOSS) to reduce the number of instances with small gradients, saving time and space during information gain calculation. Exclusive Feature Bundling (EFB) is employed to bundle mutually exclusive features, reducing dimensionality [10]. LightGBM adopts a

Fig. 1 The flowchart of patient enrollment and scheme for analysis.
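The GOSS step described for LightGBM can be sketched in a few lines of plain Python. This is an illustrative reading of the sampling idea, not LightGBM's internal implementation: the function name `goss_sample`, the rates `top_rate` and `other_rate`, and the toy gradients are assumptions.

```python
import random

def goss_sample(gradients, top_rate=0.2, other_rate=0.1, seed=0):
    """Return {instance index: weight} for the instances kept by one-side sampling."""
    n = len(gradients)
    # Rank instances by absolute gradient, largest first.
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    n_top = int(top_rate * n)
    n_other = int(other_rate * n)

    top = order[:n_top]    # large-gradient instances are always kept
    rest = order[n_top:]   # small-gradient instances are subsampled
    rng = random.Random(seed)
    sampled = rng.sample(rest, min(n_other, len(rest)))

    # Up-weight the sampled small-gradient instances so that their total
    # contribution to the information gain estimate stays roughly unchanged.
    amplify = (1.0 - top_rate) / other_rate
    weights = {i: 1.0 for i in top}
    weights.update({i: amplify for i in sampled})
    return weights

# Hypothetical gradients: instance i has gradient i, so the last 20 are the largest.
toy_gradients = [float(i) for i in range(100)]
weights = goss_sample(toy_gradients)
print(len(weights), sorted(weights)[-3:])
```

With these rates, 30 of the 100 instances are used for the split search: the 20 with the largest gradients at weight 1 and 10 randomly chosen others at weight 8.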

Fig. 2 Performances of SIR models using six machine learning algorithms in the test cohort were evaluated and compared through ROC curves.